AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-13-2026, 12:16:39 GMT

HitNet: Hybrid Ternary Recurrent Neural Network

Peiqi Wang, Xinfeng Xie, Lei Deng, Guoqi Li, Dongsheng Wang, Yuan Xie

Recurrent Neural Networks (RNNs) yield great results across many natural language processing applications, including speech recognition, machine translation, language modeling, and question answering [1,2,3,4,5].

machine learning, natural language, quantization

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-8-2026, 18:32:03 GMT

4aa13186c795a52ba88f5b822f4b77eb-Supplemental-Conference.pdf

dataset, log 1, see-d

Country: Asia > Middle East > Israel (0.14)

Industry:

Banking & Finance (0.46)
Health & Medicine > Therapeutic Area (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)

Neural Information Processing Systems, Nov-20-2025, 22:32:27 GMT

HitNet: Hybrid Ternary Recurrent Neural Network

full precision model, hitnet, hybrid ternary recurrent neural network

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Nov-20-2025, 17:52:52 GMT

HitNet: Hybrid Ternary Recurrent Neural Network

Peiqi Wang, Xinfeng Xie, Lei Deng, Guoqi Li, Dongsheng Wang, Yuan Xie

As shown in Figure 1, quantizing weights into ternary values {-1, 0, 1} can only save 1.4

activation, quantization, quantization method

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > Promising Solution (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

arXiv.org Artificial Intelligence, Nov-4-2025

Chain-of-thought (CoT) reasoning enhances the performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We present the first comprehensive study of CoT faithfulness in large vision-language models (LVLMs), investigating how both text-based and previously unexplored image-based biases affect reasoning and bias articulation. Our work introduces a novel, fine-grained evaluation pipeline for categorizing bias articulation patterns, enabling significantly more precise analysis of CoT reasoning than previous methods. This framework reveals critical distinctions in how models process and respond to different types of biases, providing new insights into LVLM CoT faithfulness. Our findings reveal that subtle image-based biases are rarely articulated compared to explicit text-based ones, even in models specialized for reasoning. Additionally, many models exhibit a previously unidentified phenomenon we term "inconsistent" reasoning: reasoning correctly before abruptly changing answers, which serves as a potential canary for detecting biased reasoning from unfaithful CoTs. We then apply the same evaluation pipeline to revisit CoT faithfulness in LLMs across various levels of implicit cues. Our findings reveal that current language-only reasoning models continue to struggle with articulating cues that are not overtly stated.

large language model, machine learning, natural language

2505.23945

Country: North America (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Aug-14-2025, 16:12:58 GMT

A Broader Impact and Limitation Discussion

We provide all missing proofs in this section. We prove the statement by contradiction. Next we show the proof for the second half. Now we show the last piece of the statement by construction. We prove the statement via three main steps.

dataset, log 1, probability

Country: Asia > Middle East > Israel (0.14)

Industry:

Banking & Finance (0.46)
Health & Medicine > Therapeutic Area (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Helen Qu, Sang Michael Xie

Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models

arXiv.org Artificial Intelligence, Jul-11-2025

CLIP and large multimodal models (LMMs) have better accuracy on examples involving concepts that are highly represented in the training data. However, the role of concept combinations in the training data on compositional generalization is largely unclear. For instance, how does accuracy vary when a common object appears in an uncommon pairing with another object? In this paper, we investigate how word co-occurrence statistics in the pretraining dataset (a proxy for co-occurrence of visual concepts) impact CLIP/LMM performance. To disentangle the effects of word co-occurrence frequencies from single-word frequencies, we measure co-occurrence with pointwise mutual information (PMI), which normalizes the joint probability of two words co-occurring by the probability of their co-occurring independently. Using synthetically generated images with a variety of concept pairs, we show a strong correlation between PMI in the CLIP pretraining data and zero-shot accuracy in CLIP models trained on LAION-400M (r=0.97 and a 14% accuracy gap between images in the top and bottom 5% of PMI values), demonstrating that even accuracy on common concepts is affected by the combination of concepts in the image. Leveraging this finding, we reproduce this effect in natural images by editing them to contain pairs with varying PMI, resulting in a correlation of r=0.75. Finally, we demonstrate that this behavior in CLIP transfers to LMMs built on top of CLIP (r=0.70 for TextVQA, r=0.62 for VQAv2). Our findings highlight the need for algorithms and architectures that improve compositional generalization in multimodal models without scaling the training data combinatorially. Our code is available at https://github.com/helenqu/multimodal-pretraining-pmi.
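For readers unfamiliar with PMI, the following is a minimal Python sketch of the standard definition, PMI(x, y) = log( p(x, y) / (p(x) p(y)) ), estimated from caption-level co-occurrence. It is not the authors' code (see the repository linked above for that). The function name `pmi_table`, the whitespace tokenization, and the choice to count each word at most once per caption are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(captions):
    """Estimate PMI for every unordered word pair that co-occurs in a caption.

    Probabilities are caption-level frequencies (an illustrative assumption):
    p(x) is the fraction of captions containing x, and p(x, y) is the
    fraction of captions containing both x and y.
    """
    n = len(captions)
    word_docs = Counter()  # number of captions containing each word
    pair_docs = Counter()  # number of captions containing each word pair
    for caption in captions:
        words = sorted(set(caption.lower().split()))
        word_docs.update(words)
        pair_docs.update(combinations(words, 2))
    return {
        (x, y): math.log((count / n) / ((word_docs[x] / n) * (word_docs[y] / n)))
        for (x, y), count in pair_docs.items()
    }


# Toy corpus: "dog" and "ball" co-occur more often than chance, "cat" and "ball" less.
table = pmi_table(["dog ball", "dog ball", "cat ball", "cat sofa"])
for pair, value in sorted(table.items()):
    print(pair, round(value, 3))
```

A positive value means the pair appears together more often than independent occurrence would predict, and a negative value means less often. This is why PMI separates pair frequency from single-word frequency.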

large language model, machine learning, natural language

2507.08

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko, Ulrike von Luxburg

How much can we forget about Data Contamination?

arXiv.org Artificial Intelligence, Oct-25-2024

The leakage of benchmark data into the training data has emerged as a significant challenge for evaluating the capabilities of large language models (LLMs). In this work, we use experimental evidence and theoretical estimates to challenge the common assumption that small-scale contamination renders benchmark evaluations invalid. First, we experimentally quantify the magnitude of benchmark overfitting based on scaling along three dimensions: the number of model parameters (up to 1.6B), the number of times an example is seen (up to 144), and the number of training tokens (up to 40B). We find that if model and data follow the Chinchilla scaling laws, minor contamination indeed leads to overfitting. At the same time, even contamination repeated 144 times can be forgotten if the training data is scaled beyond five times Chinchilla, a regime characteristic of many modern LLMs. We then derive a simple theory of example forgetting via cumulative weight decay. It allows us to bound the number of gradient steps required to forget past data for any training run where we know the hyperparameters of AdamW. This indicates that many LLMs, including Llama 3, have forgotten the data seen at the beginning of training. Experimentally, we demonstrate that forgetting occurs faster than what is predicted by our bounds. Taken together, our results suggest that moderate amounts of contamination can be forgotten at the end of realistically scaled training runs.
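To give a feel for the "cumulative weight decay" idea mentioned in the abstract, here is a rough Python sketch. It relies only on the generic property of decoupled weight decay in AdamW: each step multiplies the existing weights by (1 - lr * weight_decay), so the weights present at some past step are scaled by the product of these factors over all later steps. The abstract does not state the paper's actual bound, so this is not a reproduction of it. The function names and the constant-learning-rate simplification are assumptions.

```python
import math


def cumulative_decay(learning_rates, weight_decay):
    """Factor by which earlier weights are shrunk after the given steps.

    Assumes decoupled weight decay (as in AdamW): each step multiplies the
    existing weights by (1 - lr_t * weight_decay).
    """
    factor = 1.0
    for lr in learning_rates:
        factor *= 1.0 - lr * weight_decay
    return factor


def steps_to_shrink(lr, weight_decay, target):
    """Smallest step count n with (1 - lr * weight_decay) ** n <= target.

    Simplifying assumption: a constant learning rate.
    """
    return math.ceil(math.log(target) / math.log(1.0 - lr * weight_decay))


# Illustrative hyperparameters only; they are not taken from the paper.
print(cumulative_decay([1e-3] * 1000, 0.1))
print(steps_to_shrink(1e-3, 0.1, 0.5))
```

With a learning-rate schedule rather than a constant rate, `cumulative_decay` can be called directly on the list of per-step learning rates.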

large language model, machine learning, natural language

2410.03249

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Patrik Okanovic, Andreas Kirsch, Jannes Kasper, Torsten Hoefler, Andreas Krause, Nezihe Merve Gürel

All models are wrong, some are useful: Model Selection with Limited Labels

arXiv.org Artificial Intelligence, Oct-24-2024

We introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers. Given a pool of unlabeled target data, MODEL SELECTOR samples a small subset of highly informative examples for labeling, in order to efficiently identify the best pretrained model for deployment on this target dataset. Through extensive experiments, we demonstrate that MODEL SELECTOR drastically reduces the need for labeled data while consistently picking the best or near-best performing model. Across 18 model collections on 16 different datasets, comprising over 1,500 pretrained models, MODEL SELECTOR reduces the labeling cost by up to 94.15% to identify the best model compared to the cost of the strongest baseline. Our results further highlight the robustness of MODEL SELECTOR in model selection, as it reduces the labeling cost by up to 72.41% when selecting a near-best model, whose accuracy is only within 1% of the best model.

artificial intelligence, machine learning, model selection

2410.13609

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)